Differential Eligibility Vectors for Advantage Updating and Gradient Methods
Author
Abstract
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD error at each state. Specifically, we use DEV in TD-Q(λ) to learn the relative value of the actions more accurately, rather than their absolute value. We identify conditions that ensure convergence w.p.1 of TD-Q(λ) with DEV and show that this algorithm can also be used to directly approximate the advantage function associated with a given policy, without the need to compute an auxiliary function, something that, to the best of our knowledge, was not previously known to be possible. Finally, we discuss the integration of DEV in LSTDQ and actor-critic algorithms.
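For orientation, the two ingredients the abstract builds on can be sketched as follows. This is a minimal illustration and not the DEV update itself, since the abstract does not define it. It shows the textbook advantage function A(x, a) = Q(x, a) - V(x), which captures the relative rather than absolute value of actions, and one step of linear TD-Q(λ) with a standard accumulating eligibility vector. The function names `advantages` and `td_q_lambda_step`, the feature vectors, and the parameter values are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def advantages(q_values: Sequence[float], policy: Sequence[float]) -> List[float]:
    """Advantage of each action in one state: A(x, a) = Q(x, a) - V(x).

    V(x) is the policy-weighted average of the action values,
    V(x) = sum_a pi(a | x) * Q(x, a).
    """
    state_value = sum(p * q for p, q in zip(policy, q_values))
    return [q - state_value for q in q_values]


def td_q_lambda_step(
    theta: Sequence[float],
    z: Sequence[float],
    phi_sa: Sequence[float],
    phi_next: Sequence[float],
    reward: float,
    gamma: float,
    lam: float,
    alpha: float,
) -> Tuple[List[float], List[float]]:
    """One linear TD-Q(lambda) update with a STANDARD accumulating trace.

    Assumption: Q(x, a) is approximated as theta . phi(x, a). This is the
    conventional eligibility vector, not the differential one from the paper.
    """
    # Decay the old eligibility vector and add the current features.
    new_z = [gamma * lam * zi + fi for zi, fi in zip(z, phi_sa)]
    # TD error: reward plus discounted next value minus current value.
    q_sa = sum(t * f for t, f in zip(theta, phi_sa))
    q_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * q_next - q_sa
    # Move the parameters along the eligibility vector, scaled by the TD error.
    new_theta = [t + alpha * delta * zi for t, zi in zip(theta, new_z)]
    return new_theta, new_z


if __name__ == "__main__":
    # Three actions in one state, evaluated under a fixed stochastic policy.
    print(advantages([1.0, 2.0, 3.0], [0.25, 0.5, 0.25]))
    # A single update from zero-initialised parameters and trace.
    print(td_q_lambda_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                           reward=1.0, gamma=0.9, lam=0.5, alpha=0.1))
```

Under the policy used to compute V(x), the advantages average to zero, which is why they express only how actions compare with one another within a state.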
Similar resources
Reinforcement Learning Applied to a Differential Game
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a p...
Advantage Updating Applied to a Differential Game
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile a...
Gradient-based Ant Colony Optimization for Continuous Spaces
A novel version of Ant Colony Optimization (ACO) algorithms for solving continuous space problems is presented in this paper. The basic structure and concepts of the originally reported ACO are preserved and adaptation of the algorithm to the case of continuous space is implemented within the general framework. The stigmergic communication is simulated through considering certain direction vect...
Dynamic anomaly detection by using incremental approximate PCA in AODV-based MANETs
Mobile Ad-hoc Networks (MANETs) are more vulnerable than other networks because of inherent properties such as dynamic topology and lack of infrastructure. A considerable challenge for these networks is therefore to develop a method that can identify anomalies with high accuracy as the network topology changes. In this paper, two methods are proposed for dynamic anom...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011